NWDAF (Data) Analysis¶

The data we considered in this study is the public data for nwdaf.

The data measures the 'load' for each time 't', and this on various points determined by 'cell_id', 'cat_id' and 'pe_id'. Analysis of Time Series considers a unique time measurement for a given 't', which is not the case here. As a result, we considered a specific time serie for every 'cell_id', 'cat_id' and 'pe_id' - which makes 75 different Time Series.

The analysis below also revealed the different values associated to 'cell_id', 'cat_id' and 'pe_id', that is:

  • cell_id values are [0 1 2 3 4]
  • cat_id values are [0 1 2]
  • pe_id values are [0 1 2 3 4]
In [5]:
!wget https://raw.githubusercontent.com/sevgicansalih/nwdaf_data/master/nwdaf_data.csv
--2024-04-14 17:46:18--  https://raw.githubusercontent.com/sevgicansalih/nwdaf_data/master/nwdaf_data.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41050258 (39M) [text/plain]
Saving to: ‘nwdaf_data.csv.1’

nwdaf_data.csv.1    100%[===================>]  39.15M  1.42MB/s    in 29s     

2024-04-14 17:46:49 (1.35 MB/s) - ‘nwdaf_data.csv.1’ saved [41050258/41050258]

In [1]:
import pandas as pd
pd.options.plotting.backend = "plotly"
In [2]:
df = pd.read_csv( 'nwdaf_data.csv' )
df.info()
print( df.head() )
for col in df.columns: 
  v = df[ col ].unique()  
  print( f"   - {col} contains the following ({len( v )}) distinct values: {v}" )
print( "Evaluating the number of measurements per time values" )    
print( df.groupby( by=[ 't' ], sort=False ).agg( { 't' : 'count' } )[ 't' ] )
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1296000 entries, 0 to 1295999
Data columns (total 6 columns):
 #   Column       Non-Null Count    Dtype  
---  ------       --------------    -----  
 0   t            1296000 non-null  int64  
 1   cell_id      1296000 non-null  int64  
 2   cat_id       1296000 non-null  int64  
 3   pe_id        1296000 non-null  int64  
 4   load         1296000 non-null  float64
 5   has_anomaly  1296000 non-null  int64  
dtypes: float64(1), int64(5)
memory usage: 59.3 MB
   t  cell_id  cat_id  pe_id       load  has_anomaly
0  0        0       0      0   4.997074            0
1  0        0       0      1  16.004322            0
2  0        0       0      2  52.985386            0
3  0        0       0      3   0.999767            0
4  0        0       0      4   5.000597            0
   - t contains the following (17280) distinct values: [    0     1     2 ... 17277 17278 17279]
   - cell_id contains the following (5) distinct values: [0 1 2 3 4]
   - cat_id contains the following (3) distinct values: [0 1 2]
   - pe_id contains the following (5) distinct values: [0 1 2 3 4]
   - load contains the following (1296000) distinct values: [ 4.99707406 16.0043221  52.98538596 ... 90.08669958  1.00640726
  6.02018468]
   - has_anomaly contains the following (2) distinct values: [0 1]
Evaluating the number of measurements per time values
t
0        75
1        75
2        75
3        75
4        75
         ..
17275    75
17276    75
17277    75
17278    75
17279    75
Name: t, Length: 17280, dtype: int64

Analysis of the (0,0,0) Time Serie¶

In this section we look more in detail to the data of a specific Time Serie that corresponds to cell_id=0, cat_id=0, pe_id=0.
The Time Serie provides load measurements as a function of time, and each of these measurements is labeled with has_anomaly.

Looking at the distribution of the load, we can see that the normal load ( i.e. without anomaly) is distributed around a value of 5 (4.995 - 5.005). Similarly, the load with anomaly is distributed around the categories (5.005 - 5.015), (5.165 - 5.175 and (5.315 - 5.325). As a result, traffic not around a value of 5 seems to have anomaly, and the challenge seems to be being able to determine whether time permits to determine if a load presents anomaly or not around a value of 5.

Looking also at the number of occurrences there is much more abnormal traffic. This may be challenging for anomaly detection techniques where abnormal traffic is expected to be 'exceptional'.

We also depict the distribution of normal and abnormal traffic according to time. This does not reveal any particular distribution. However it confirms the huge presence of abnormal traffic compared to normal.

Finally, we plot abnormal traffic and normal traffic measurement according to each ceel_id, cat_id and pe_id. The abnormal traffic is similarly larger than the normal traffic for every of these features.

In [3]:
## reduction 
cell_id = 0
cat_id = 0
pe_id = 0
df = df[ ( df.cell_id == cell_id ) & ( df.cat_id == cat_id ) & ( df.pe_id == pe_id ) ] 
df.head( )
Out[3]:
t cell_id cat_id pe_id load has_anomaly
0 0 0 0 0 4.997074 0
75 1 0 0 0 4.997780 0
150 2 0 0 0 5.003721 1
225 3 0 0 0 5.002989 1
300 4 0 0 0 5.009872 1

Data Distribution¶

In [4]:
import plotly.express as px
import plotly.subplots 
import kaleido
from IPython.core.display import SVG
import os
## specifically for load
fig = px.histogram( df, x='load', color='has_anomaly', 
                   title="load distribution between 'normal' and 'abnormal' traffic" )
fig.show()
In [5]:
## specifically for time
fig = px.histogram( df, x='t', color='has_anomaly', 
                   title="Distribution of 'normal' and 'abnormal' traffic accross time" )
fig.show()
In [6]:
## others
subfig_list = [px.histogram( df, x=feature, color='has_anomaly' ) for feature in [ 'cell_id', 'cat_id', 'pe_id' ] ]

fig = plotly.subplots.make_subplots( rows=1, cols=len(subfig_list) )
for subfig in subfig_list :  
  xaxis_title_text = subfig[ 'layout' ][ 'xaxis' ][ 'title' ][ 'text' ]
  yaxis_title_text = subfig[ 'layout' ][ 'yaxis' ][ 'title' ][ 'text' ]
  col = subfig_list.index( subfig ) + 1    
  for trace in range(len(subfig["data"])):
    fig.add_trace( subfig["data"][trace], row=1, col= col )
    fig.update_xaxes( title_text=xaxis_title_text, row=1, col=col )
    fig.update_yaxes( title_text=yaxis_title_text, row=1, col=col )
fig.update_layout(title_text="Distribution of *_id accross 'normal' and 'abnormal' traffic")
fig.show() 

PCA¶

The previous section we saw that the value of the load alone is not sufficient to determine whether it is abnormal or not and that probably time might be useful at that point. In this section we want to see how time and load are correlated and see if there is a better space to represent the data.

Plotting the different measurements in a 'load', 't' space even when the components are normalized does not show a clear split between abnormal and normal measurements.

The PCA reduction shows that we need to keep 2 dimensions and the representation within that space depicts the normal measurements to be in small clusters. We do show that applying a simple projection can be very promising. Now we will investigate how ML can do better.

In [7]:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.preprocessing import StandardScaler 
from sklearn.decomposition import PCA
import numpy as np

df = pd.read_csv( 'nwdaf_data.csv' )
df = df[ ( df.cell_id == 0 ) & ( df.cat_id == 0 ) & ( df.pe_id == 0 ) ] 
df[ 'has_anomaly' ] = df[ 'has_anomaly' ].map( { 0: 'Normal', 1: 'Abnormal' } )
#print( df )
## removing columns
X = df[ ['t', 'load'] ]

#Target = df[ [ 'has_anomaly' ] ][ 'has_anomaly'].map( { 0: 'Normal', 1: 'Abnormal' } )


fig = px.scatter_matrix(df,
    dimensions=X.columns,
    color="has_anomaly", title="Scatter Plot Without Transformation")
fig.show()
In [8]:
Target = df[ 'has_anomaly' ]
x_scaled = StandardScaler().fit_transform(X)
scaled_df = pd.DataFrame.from_records( x_scaled )
scaled_df[ 'has_anomaly' ] = Target.to_list()
fig = px.scatter_matrix(scaled_df,
    dimensions=[ 0, 1],
    color="has_anomaly", title="Scatter Plot of Normalized Features")
fig.show()
In [9]:
#def characterize_pca( X, filename="" ):
X = df[ ['t', 'load'] ]
pca = PCA(n_components = len( X.columns ) )
pca.fit( X ) # scaling
X_pca = pca.transform(X)
print(f"PCA Components (~base): {pca.components_}")
print(f"PCA Explained Variance (~length): {pca.explained_variance_}")
print(f"PCA Explained Variance ratio: {pca.explained_variance_ratio_}")

plt.plot(np.cumsum(pca.explained_variance_ratio_) )
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')
plt.title( "PCA Dimension reduction" )
plt.show()
PCA Components (~base): [[ 1.00000000e+00 -4.85702041e-07]
 [ 4.85702041e-07  1.00000000e+00]]
PCA Explained Variance (~length): [2.4884640e+07 1.7136686e-02]
PCA Explained Variance ratio: [9.99999999e-01 6.88645124e-10]
No description has been provided for this image
In [10]:
X_new = pca.inverse_transform(X_pca)
#print( X_new )
new_df = pd.DataFrame.from_records( X_new )
dim = new_df.columns
new_df[ 'has_anomaly' ] = Target

#print( new_df.head ) 
fig = px.scatter_matrix(new_df,
    dimensions=dim,
    color="has_anomaly", title="Scatterr Plot of Measurements in their new PCA base")
fig.show()

Measuring the performance of a model¶

Measuring the performance of a model is not that easy to characterize. In our case, we want the model to predict whether a measurmeent is 'normal' or 'abnormal'. However, the model provides a probability whether the measurment is 'normal' or 'abnormal'. From that probability, we decide with a threshold in which category the measurement falls. It is not the model that defines the threshold but us to characterize some metrics, so the threshold itself should not so much reflect the performances of the model.

More precisely, a threshold may be set to maximize or establish a balance between metrics that may be contradictive. Typically if we want to minimize the number of measurments that are detected as 'abnormal' but are in fact 'normal' (False Positive) we end up in missing some measurements that are 'abnormal' and classified them as 'normal' instead. That choice depends on what we want the model to achieve.

On the other hand, we can measure how a model globally can perform better than a random model. To do so, we measure the Area Under the Curvve (AUC) with the Curve being a Receiver Operating Characteristice (ROC) or a Precision Recall Curve.

How to Use ROC Curves and Precision-Recall Curves for Classification in Python provides a good introduction for ROC and Recall functions. In this section we provide a summary on how to apply these in our case.

ROC Curves and ROC AUC with a Random Forest Model¶

Our case is a 2 classes problem where each measurment is either 'normal' or 'abnormal' and we want to predict 'abnormal'. Classification can lead to the following cases:

  • a 'normal' measurement is calssified as 'normal'. In this case, we call this a True Negative.
  • a 'normal' measurement is classified as 'abnormal'. In this case we call it a False Positive.
  • an 'abonormal' measurement is classified as 'abnormal'. In thi scase we call it True Positive.
  • an 'abnormal' measurment is classified as 'normal'. In this case we call it a False Negative.

False Negative and True Positive are dependent on what we want to predict. In our case we want to predict 'abnormal' but if we wanted to predict 'normal' Positive and Negative would be inversed. Similarly, if we want to distinguish both 'apple' and 'banana', it seems to me there is no Positive or Negative.

So True / False is associated to the prediction and Positive / Negative is associated to the fact that we predict 'abnormal' (Positive) or 'normal' (Negative).

We thus define the following rates:

True Positive Rate (or Sensitivity) describes shows how well the models predicts the event occurs among all good predictions. In our case, this is the 'abnormal' measurements predicted as 'abnormal' over 'abnormal' measurements (predicted as 'abnormal' or 'normal').
$$True\; Positive\; Rate = \frac{True\; Positive}{True ;Positive + False\; Negative}$$

False Positive Rate (or False Alarm Rate) describes how often an event is predicted when it does not happen. In our case, that is the number of 'normal' measurements predicted as 'abnormal' over the number of 'normal' measurements (predicted as 'abnormal' or 'normal'). $$False\; Positive\; Rate = \frac{False\; Positives}{ False\; Positives + True \; Negatives}$$

Specificity complements the False Positive Rate. In our case it the number of 'normal' measurement predicted as 'normal' over the number of 'normal' measurements (predicted as 'abnormal' or 'normal').

$$Specificity = 1 - False\; Positive\; Rate $$ $$Specificity = \frac{True\; Negative}{ False\; Positives + True\; Negatives} $$

Receiver Operating Characteristice (ROC) curves represents True Positive Rate (y-axis) versus False Positive Rate (x-axis). So y-axis represents the probability of predicting the 'abnormal' traffic correctly while the y-axis represents the ability of the model to predict 'abnormal' while it is normal. By setting an acceptable False Positive (and allowing false alarms) we can detect more True Positive. In our case, the False Positve Rate represents how much we can deal we 'normal' measurements to be flagged as 'abnormal'. The cost is that we capture more 'abnormal' traffic being flagged as 'abnormal'.

Ideally we would like all 'abnormal' measurements to be predicted as 'abnormal' (True Positive = 1) and no 'normal' measuremenst being predicted as 'abnormal'. We are not in a ideal world and the ROC shows that roughly the price is 2 False Positive for an additional True Positive.

While we cannot have all True Positive and no False Positive, we need to select what is acceptable in our case and set the model with these selected parameters. Under the hodd, this means the we are able to go through all these parameters and evaluate the resulting True Positive and no False Positive for each configuration and than plot the ROC. These parameters ( named as in the code below rf_probs) are provided via the predict_and_proba function.

## creation of the model
classifier = sklearn.ensemble.RandomForestClassifier(\
                  random_state=123,
                  n_estimators=100,
                  max_depth=10 )
## training the model
classifier.fit( X=trainX , y=trainy ) 
rf_probs = classifier.predict_proba( testX )
## computing the False Positive Rate and True Positive Rate 
## for all parameters.
rf_fpr, rf_tpr, rf_threshold = sklearn.metrics.roc_curve(testy, rf_probs)

The main difference between predict and predict_and_proba is that predict is expected to return for each test input the class that has been predicted by the model, predict_and_proba returns the confidence that the model has into its prediction. That confidence is expressed in term of probablility. What Is The Difference Between predict() and predict_proba() in scikit-learn? Difference between predict and predict_proba in Sklearn.

The roc_curve function takes the probabilities (rf_probs) and the true values (testy). The function generates a threshold that is used to determine given a probability which class the test belongs to. For each corresponding threshold a True Positive Rate and False Positive Rate is computed.

The roc_auc_score function computes the AUC that is it generates the ROC curves and computes the area between the No Skill line and the one generated by the classifier. This number charactyerises the performance of the model, that is for any possible threshold being chosen.

Note that the classifier has a score function which provides the average accuracy over all labels. In our case, we have two labels so it is likely to be the

In [17]:
import os
import time_series_utils as tu
from IPython.core.display import SVG
import sklearn.ensemble

import matplotlib.pyplot as plt

# roc curve and auc
#from sklearn.datasets import make_classification
#from sklearn.linear_model import LogisticRegression
import sklearn.ensemble
import sklearn.metrics 

#from sklearn.model_selection import train_test_split
#from sklearn.metrics import roc_curve
#from sklearn.metrics import roc_auc_score

## Selecting aand formating a subset of the data
cell_id=0
cat_id=0 
pe_id=0 

# formating the dataset
data = tu.NWDAFDataSet ( cell_id=cell_id, cat_id=cat_id, pe_id=pe_id )

#regressor = sklearn.ensemble.RandomForestRegressor(\
#                  random_state=123,
#                  n_estimators=100,
#                  max_depth=10 )

classifier = sklearn.ensemble.RandomForestClassifier(\
                  random_state=123,
                  n_estimators=100,
                  max_depth=10 )

test_ratio = 20
data_train, data_test = data.get_train_test( test_ratio )

trainX = data_train.drop( columns=[ 'y' ] )
trainy = data_train[ 'y' ]

classifier.fit( X=trainX , y=trainy ) 

# generate a no skill prediction (majority class)
ns_probs = [0 for _ in range(len(data_test))]
testX = data_test.drop( columns=[ 'y' ] )
testy = data_test[ 'y' ]

# predict probabilities
rf_probs = classifier.predict_proba( testX )
rf_predictions = classifier.predict( testX )
print( "Comparing predict and predict_and_proba:" )
print( f"  -- predict: {rf_predictions}" )
print( f"  -- predict and proba: {rf_probs}" )
# keep probabilities for the positive outcome only
rf_probs = rf_probs[:, 1]

# calculate AUC
ns_auc = sklearn.metrics.roc_auc_score( testy, ns_probs)
rf_auc = sklearn.metrics.roc_auc_score( testy, rf_probs)
print( f"\nComputing ROC AUC" )
print('  - No Skill: ROC AUC=%.3f' % (ns_auc))
print('  - RandomForest: ROC AUC=%.3f\n' % (rf_auc))
print( f"  - score: {classifier.score( X=testX , y=testy )}"  ) 

# calculate roc curves
ns_fpr, ns_tpr, _ = sklearn.metrics.roc_curve(testy, ns_probs)
rf_fpr, rf_tpr, rf_thresholds = sklearn.metrics.roc_curve(testy, rf_probs)
print( f"Random Forest False Positive Rate: {rf_fpr[:5]}..." )
print( f"Random Forest True Positive Rate: {rf_tpr[:5]}..." )
print( f"Random Forest Thresholds: {rf_thresholds[:5]}..." )


# plot the roc curve for the model
plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
plt.plot(rf_fpr, rf_tpr, marker='.', label='RandomForest')
# axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
# show the legend
plt.legend()
plt.tight_layout()

# show the plot
svg_fn = os.path.join( './', "ROC_curve_randomforest_classifier.svg" )
plt.savefig(svg_fn)
plt.close()
tu.show_svg(svg_fn, width="50%")
Comparing predict and predict_and_proba:
  -- predict: [1 1 1 ... 0 0 0]
  -- predict and proba: [[0.   1.  ]
 [0.   1.  ]
 [0.   1.  ]
 ...
 [0.86 0.14]
 [0.86 0.14]
 [0.86 0.14]]

Computing ROC AUC
  - No Skill: ROC AUC=0.500
  - RandomForest: ROC AUC=0.863

  - score: 0.8177867902665121
Random Forest False Positive Rate: [0.         0.06922285 0.30725682 0.32012352 0.50437468]...
Random Forest True Positive Rate: [0.         0.72857784 0.85471811 0.85965306 0.91326454]...
Random Forest Thresholds: [ inf 1.   0.99 0.98 0.97]...
Out[17]:
No description has been provided for this image

As ROC curves are very common, sklearn provides various ways to build it in one line using RocCurveDisplay

Similarly, we can also plot it using plotly. I do prefer plotly as it enables interactions.

In [18]:
import plotly.express as px

rf_fpr, rf_tpr, rf_thresholds = sklearn.metrics.roc_curve(testy, rf_probs)
auc = sklearn.metrics.auc(rf_fpr, rf_tpr)

fig = px.area(
    x=rf_fpr, y=rf_tpr,
    title=f'ROC Curve with Plotly (AUC={auc:.4f})',
    labels=dict(x='False Positive Rate', y='True Positive Rate'),
    width=700, height=500
)
fig.add_shape(
    type='line', line=dict(dash='dash'),
    x0=0, x1=1, y0=0, y1=1
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.update_xaxes(constrain='domain')
fig.show()

svg_fn = "ROC_curve_using_RocCurveDisplay.svg"
if os.path.isfile( svg_fn ) is False:
  fig, ax = plt.subplots(figsize=(12, 4), nrows=1, ncols=3 )
  disp = sklearn.metrics.RocCurveDisplay(fpr=rf_fpr, tpr=rf_tpr, roc_auc=rf_auc,
                                         estimator_name='Random Forest Estimator' )
  disp.plot( ax=ax[ 0 ] )
  ax[ 0 ].set_title( "RocCurveDisplay" )
  disp = sklearn.metrics.RocCurveDisplay.from_predictions(testy, rf_predictions )
  disp.plot( ax=ax[ 1 ] )
  ax[ 1 ].set_title( "from_predictions" )
  disp  = sklearn.metrics.RocCurveDisplay.from_estimator( classifier, testX, testy )
  disp.plot( ax=ax[ 2 ] )
  ax[ 2 ].set_title( "from_estimator" )
  fig.suptitle("ROC Curves with RocCurveDisplay" )  
  plt.tight_layout()
  fig.savefig( svg_fn )
  plt.close( fig )
  #plt.show()

tu.show_svg("ROC_curve_using_RocCurveDisplay.svg" )
Out[18]:
No description has been provided for this image

Precision-Recall Curves and Precision-Recall AUC¶

Precision estimates the proportion of True Positive among all Positive predictions. That is when the measurement is classified as 'abnormal' how can we be certain it is 'abnormal'. $$Precision = \frac{True\; Positives}{True\; Positives + False\; Positives}$$

It differs from the True Positive Rate which mostly care on how much of the 'abnormal' measurements are appropriately classified. The set is determine BEFORE the classifier runs, while Precision defines the set of measurements to consider AFTER the classifier has run.

Recal or Sensitivity measures how much of the classification misses its target. In ourcase that is the 'abnormal' classified as 'abnormal' over the 'abnormal' (classified as 'abnormal' and 'normal' ).

$$Recall = \frac{True\; Positives}{True Positives + False Negatives}$$

As Precision - Recall function are mostly focused on the output of the classifier, Recal function are less influenced by unbalanced data.

In our case, our measurements are unblanced, so the Precision-Recall function may be more relevant.

Ideally, we would like classifier to be very reliable with a Precision to be 1 (with no False Positive) and Recall to be 1 (with no False negative). Unfortunately increasing the Precision requires the Recall to decrease.

In [19]:
# predict class values
predictions = classifier.predict(testX)
print( "Comparing predict and predict_and_proba:" )
print( f"  - Predictions (predict): {predictions[:10]}...")
print( f"  - Predictions (predict_and_proba): {rf_probs[:10]}...\n")

rf_precision, rf_recall, rf_threshold = sklearn.metrics.precision_recall_curve(testy, rf_probs)
print( "Displaying precision and recall:" )
print( f"  - Random Forest precision: {rf_precision[:10]}..." )
print( f"  - Random Forest Recall: {rf_recall[:10]}...\n" )

print( f"Computing Recal-Precision AUC" )
rf_auc = sklearn.metrics.auc(rf_recall, rf_precision)
print('  - RandomForest: auc=%.3f\n' % (rf_auc))

# plot the precision-recall curves
no_skill = len(testy[testy==1]) / len(testy)
plt.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')
plt.plot(rf_recall, rf_precision, marker='.', label='RandomForest')
# axis labels
plt.xlabel('Recall')
plt.ylabel('Precision')
# show the legend
plt.legend()
# show the plot
# show the plot
svg_fn = os.path.join( './', "ROC_PR_curve_randomforest_classifier.svg" )
plt.savefig(svg_fn)
plt.close()
tu.show_svg(svg_fn, width="50%")
Comparing predict and predict_and_proba:
  - Predictions (predict): [1 1 1 1 1 1 1 1 1 1]...
  - Predictions (predict_and_proba): [1.   1.   1.   1.   1.   1.   0.99 0.99 0.97 0.97]...

Displaying precision and recall:
  - Random Forest precision: [0.77485516 0.84984266 0.85076469 0.85235407 0.85898334 0.8597193
 0.86171864 0.90236245 0.90542574 0.97313492]...
  - Random Forest Recall: [1.         0.92889188 0.92754598 0.925901   0.91730223 0.91603111
 0.91326454 0.85965306 0.85471811 0.72857784]...

Computing Recal-Precision AUC
  - RandomForest: auc=0.960

Out[19]:
No description has been provided for this image

Similarly to the ROC Curves, the Precision Recal Curve may be plot using
PrecisionRecallDisplay or plotly.

In [23]:
rf_precision, rf_recall, rf_threshold = sklearn.metrics.precision_recall_curve(testy, rf_probs)
rf_auc = sklearn.metrics.auc(rf_recall, rf_precision)

fig = px.area(
    x=rf_recall, y=rf_precision,
    title=f'Precision-Recall Curve with Plotly (AUC={rf_auc:.4f})',
    labels=dict(x='Recall', y='Precision'),
    width=700, height=500
)
fig.add_shape(
    type='line', line=dict(dash='dash'),
    x0=0, x1=1, y0=1, y1=0
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.update_xaxes(constrain='domain')

fig.show()
# show the plot

svg_fn = "PrecisionRecallDisplay_randomforest_classifier.svg"
if os.path.isfile( svg_fn ) is False:
  fig, ax = plt.subplots(figsize=(12, 4), nrows=1, ncols=3 )
  disp =  sklearn.metrics.PrecisionRecallDisplay(precision=rf_precision, recall=rf_recall )
  ax[ 0 ].set_title( "PrecisionRecallDisplay") 
  disp.plot( ax=ax[ 0 ] )
  disp = sklearn.metrics.PrecisionRecallDisplay.from_predictions(testy, predictions )
  ax[ 1 ].set_title( "from_predictions" )
  disp.plot( ax=ax[ 1 ] )
  disp = sklearn.metrics.PrecisionRecallDisplay.from_estimator( classifier, testX, testy )
  ax[ 2 ].set_title( "from_predictions" )
  disp.plot( ax=ax[ 2 ] )
  fig.suptitle("Precision Recall ROC Curves with PrecisionRecallDisplay" )  
  plt.tight_layout()
  fig.savefig( svg_fn )
  plt.close( fig )
tu.show_svg(svg_fn )      
Out[23]:
No description has been provided for this image

ROC AUC and Precision Recall AUC for Forecaster¶

Now that we have understood how to evaluate globally a probabilistic models (indpeendently of the threshold), we look at how we can apply the AUC to forecasters - which is what we are using for time prediction.

The situation is a bit different. In the RandomForest model, the claasifier is trained with some trainX and trainy. To test the classifier, we provide testX and the classifier outputs a prediction. The prediction can be a probability for each tested element to be a probability from which we can infer a ROC or Precision-Recall curve. The more value we have from testX the more we can evaluate accuractely the various True/False Positive/Negative Rate as these are directly derived by comparing the predictions and true value (testy). The position of a measurement in testX is not expected to impact the prediction.

Forecaster are a bit different as their goal is to predict the next steps from a training data trainy. Note that in the begining the forcaster ONLY considers time to determine whether the traffic is 'normal' or 'abnormal'. The more steps that has to be guessed, the more error will happen as at one point the steps will be based on predicted steps. If we go too far, the guess is very likely to be the most common occurence of 'normal' or 'abnormal'. In our case, we do have more 'abnormal' occurence, as a result, when we can no more predict, our best guess is to bet 'abnormal'. At that point we cannot claim we are performing any prediction and this is the point where predictions with a huge number of steps will converge. As a result, having large ( testX or actually testy ) may lower down the difference between the different forecasters.

The forecaster sees the trainX as a succession of '0' and '1', it will use the recurrence of these event over the last lag steps. The forecaster does not interprete these '0' and '1' as classes, but as numbers and the predictions can be considered as probabilities to take the value '1'. At least this is how we will consider these predictions.

There are a number of question we want to answer:

  1. What is the performance of the model ?
  2. How much lag does the model needs to predict ?
  3. How much step can the model predict ?

The performance of the model will be evaluated with AUC for various values of lag and a limited number of testy. The nmumber of testy will be defined as just large enough to exhaust the predicition, that is in our case sufficient for the prediction to always predict 'abnormal'. The ideal number for lag is the one that maximizes the AUC or the one after which increasing the number do not provide so much benefit to the AUC. Here "so much benefit" can be interpreted as the measured benefit do not match the extra work associated to a large lag value.

In [24]:
#import skforecast.ForecasterAutoreg
import plotly.express as px
import sklearn
import skforecast
import time_series_utils as tu
#from IPython.core.display import SVG

cell_id=0
cat_id=0 
pe_id=0 

# formating the dataset
data = tu.NWDAFDataSet ( cell_id=cell_id, cat_id=cat_id, pe_id=pe_id )

train_len = int( 80 * len( data.df ) / 100 )  
data_train, data_test = data.get_train_test( train_len )

#trainX = data_train.drop( columns=[ 'y' ] )
trainy = data_train[ 'y' ]
print( f"Long Term Prediction:" )
print( f"  - 'abnormal': {len( trainy[ trainy==1 ] ) / len( trainy ) * 100:.4f}%" )
print( f"  - 'normal': {len( trainy[ trainy==0 ] ) / len( trainy ) * 100:.4f}%" )

#testX = data_test.drop( columns=[ 'y' ] )
testy = data_test[ 'y' ]

regressor = sklearn.ensemble.RandomForestRegressor(\
                  random_state=123,
                  n_estimators=100,
                  max_depth=10 )
lag = 400
forecaster = skforecast.ForecasterAutoreg.ForecasterAutoreg(
                   regressor = regressor,
                   lags      = lag
               )

#print( f"trainy: {trainy[ -100 : ] }" )
forecaster.fit( y=trainy )

steps =  int( 1 * len( testy ) )

if steps >  len( testy ) :
  raise ValueError( f"steps {steps} is too high. Must be lower than {len( testy )}" ) 

f_predictions = forecaster.predict( steps=steps )
#print( f"  - Predictions (predict): {f_predictions[:10]}...")

f_precision, f_recall, f_threshold = sklearn.metrics.precision_recall_curve(testy[ : steps ], f_predictions )
print( f" predictions: {f_predictions[-100:]}" ) 
print( "Displaying precision and recall:" )
print( f"  - Random Forest precision: {f_precision[:10]}..." )
print( f"  - Random Forest Recall: {f_recall[:10]}...\n" )

print( "Computing AUC" )
print(f"  - Precision-Recall AUC: {sklearn.metrics.auc(f_recall, f_precision):.4f}")
print(f"  - ROC AUC: {sklearn.metrics.roc_auc_score( testy[ : steps ], f_predictions):.4f}")

fig = px.line( testy[ : steps ] ) 
#fig = px.line( tu_predictions[ : steps ] ) 
fig.add_scatter(x=f_predictions.index[ : steps ], y=f_predictions[ : steps ],name='predictions' )
#fig.add_scatter(x=tu_predictions.index[ : steps ], y=tu_predictions[ : steps ] )
fig.update_layout( 
    title='Comparison between predictions and expected values',
    xaxis_title='Date',
    yaxis_title='Percentage')
fig.show()
Long Term Prediction:
  - 'abnormal': 77.0616%
  - 'normal': 22.9384%
 predictions: 2024-01-01 04:46:20    1.0
2024-01-01 04:46:21    1.0
2024-01-01 04:46:22    1.0
2024-01-01 04:46:23    1.0
2024-01-01 04:46:24    1.0
                      ... 
2024-01-01 04:47:55    1.0
2024-01-01 04:47:56    1.0
2024-01-01 04:47:57    1.0
2024-01-01 04:47:58    1.0
2024-01-01 04:47:59    1.0
Freq: S, Name: pred, Length: 100, dtype: float64
Displaying precision and recall:
  - Random Forest precision: [0.79253472 0.79247467 0.79270411 0.79264408 0.79258401 0.79252391
 0.79275362 0.79298347 0.79321346 0.79315347]...
  - Random Forest Recall: [1.         0.9996349  0.9996349  0.99926981 0.99890471 0.99853961
 0.99853961 0.99853961 0.99853961 0.99817452]...

Computing AUC
  - Precision-Recall AUC: 0.8890
  - ROC AUC: 0.5662

Application Measuring and Selecting Forecaster¶

In the remaining of the section we will compare various Forecaster that is Autoreg and AutoregWithExog each combined with two different regressors a RandomForest regressor and Ridge, a linear regressor.

We wanted to select the appropriated number of lag used by the Forecaster and then measure their performaces. The number of lag is defined as a compromise between finding th enumber of lags that provide the highest PR AUC. On the other hand, we also wanted to limit the number of lags so that the fiting time does not exceeds 60 seconds. The fit and prediction time represents the necessary time it took to train or predict the train_test data set with tha model. Time measurement is performed on the same computer (Intel NUC) but we cannot guarantee the exact same load. Variations might result from a combination of model computation under performing due to some random choices as well as the NUC being load differently.

When we want to find an optimal value, we also want these values remain quite stable when the context evolves. In that context, we consider that the selected lag needs to provide similar metrics when the number of lag varies. We assume that that a variation of lag coudl reflect a variation of the context.

The primary metric we use is the PR AUC as it reflects how much the model outperforms a random prediction. Except for AutoregWithExog / Ridge, PR AUC is more or less a constant value over which we do have some oscilations. In that case, the parameters are chosen in order to increase the stability of the prediction. In other words, we want to minimize the impact of a small perturbation over the prediction. The variation of lag is one of this perturbation, and we proceed as follows: For each window of 100 lags, we compute the AUC for each lag and compute the L∞ distance that measures the variation of the PR AUC over each lag window. These variations are sorted and we select the windows that offer the smallest variation with afit time that remains below 60 secondes. The selected lag is the lag at the center of the window.

Once the lag has been selected, the Forecaster is defined and we display measuremenst associated to the forecatser inclusing the scores performed by various metrics. Si ilarly, the metrics are only considered when these are associated with some sort of stability and we consider metrics as non Applicable when such distance is too high.

For AutoregWithExog / Ridge, PR AUC is a decreasing function with the number of lags, so we really have to balance the stability versus the PAR AUC performances. In fact PAR AUC variations become much more stable for lag values greater than 850 which also correspond to low PR AUC values. In fact searching the optimum PR AUC as described in the randomforest case provides 1502 lags with a PR AUC mean value of 0.81 and a L∞=0.00 while we coudl expect PR AUC values greater than 0.9 for a number of lags lower than 500.

In addition, optimum scores for of number of lags greater than 750 are significantly lower (0.3 versus 0.8 for f1 binary for example). The optimum threshold are however not stable and further studies are probably needed to look how such threshold variations may impact the scores or PR AUC. As a result, we reduce the acceptable lag values to [0, 250] to apply the previous method.

The performances obtained for the Forecaster / Regressor are summurized below:

Forcaster / Regressor Autoreg / RandForest A. WithExog / RF Autoreg / Ridge A. WithExog / Ridge
optimum lag 1404 94 1722 196
optimum PR AUC: ~=0.87, L∞=0.04 ~=0.95, L∞=0.02 ~=0.79, L∞=0.00 ~=0.96, L∞=0.00
selected_lag: 412 +/- 100 94 +/- 100 1722 +/- 100 196 +/- 100
predition time (s) ~=38.95, L∞=31.31 ~=41.77, L∞=41.16 ~=1.03, L∞=4.29 ~=1.54, L∞=6.01
fit time (s) ~=51.97, L∞=44.48, coef=0.15 s/lag ~=10.69, L∞=18.50, coef=0.09 s/lag ~=8.26, L∞=9.92, coef=-0.002 s/lag ~=0.59, L∞=2.47, coef=0.00 s/lag
PR AUC ~=0.87, L∞=0.07 ~=0.95, L∞=0.02 ~=0.79, L∞=0.00 ~=0.96, L∞=0.00
Mean Square Error ~=0.22, L∞=0.08 ~=0.17, L∞=0.03 ~=0.17, L∞=0.00 ~=0.13, L∞=0.00
Mean Absolute Error ~=0.28, L∞=0.19 ~=0.19, L∞=0.03 ~=0.34, L∞=0.00 ~=0.31, L∞=0.01
R2 Score ~=-0.35, L∞=0.47 ~=-0.05, L∞=0.18 ~=-0.04, L∞=0.01 ~=0.18, L∞=0.02
steps Prediction 530.00 3455.00 steps 3455.00 steps 3455.00 steps
f1_binary (Thres) ~=0.13, L∞=0.26 ~=0.17, L∞=0.97 ~=0.32, L∞=0.13 ~=0.40, L∞=0.27
(Score) ~=0.88, L∞=0.00 ~=0.88, L∞=0.01 ~=0.88, L∞=0.00 ~=0.75, L∞=0.04
f1_micro (Thres) ~=0.14, L∞=0.35 ~=0.18, L∞=0.97 ~=0.32, L∞=0.13 ~=0.42, L∞=0.19
(Score) ~=0.79, L∞=0.01 ~=0.79, L∞=0.02 ~=0.78, L∞=0.01 ~=0.60, L∞=0.06
accuracy (Thres) ~=0.14, L∞=0.35 ~=0.18, L∞=0.97 ~=0.32, L∞=0.13 ~=0.42, L∞=0.19
(Score) ~=0.79, L∞=0.01 ~=0.79, L∞=0.02 ~=0.78, L∞=0.01 ~=0.60, L∞=0.06
precision_binary (Thres) ~=0.83, L∞=0.86 ~=0.53, L∞=0.99 ~=0.98, L∞=0.24 ~=0.89, L∞=0.05
(Score) ~=0.82, L∞=0.12 ~=0.87, L∞=0.21 ~=0.97, L∞=0.19 ~=1.00, L∞=0.00
precision_micro (Thres) ~=0.14, L∞=0.35 ~=0.18, L∞=0.97 ~=0.32, L∞=0.13 ~=0.42, L∞=0.19
(Score) ~=0.79, L∞=0.01 ~=0.79, L∞=0.02 ~=0.78, L∞=0.01 ~=0.60, L∞=0.06
average_precision_micro (Thres) ~=0.76, L∞=0.86 ~=0.38, L∞=0.99 ~=0.73, L∞=0.08 ~=0.70, L∞=0.08
(Score) ~=0.81, L∞=0.03 ~=0.80, L∞=0.03 ~=0.80, L∞=0.01 ~=0.86, L∞=0.03
recall_binary (Thres) ~=0.12, L∞=0.17 ~=0.16, L∞=0.97 ~=0.28, L∞=0.03 ~=0.25, L∞=0.11
(Score) ~=1.00, L∞=0.00 ~=1.00, L∞=0.03 ~=0.99, L∞=0.01 ~=0.75, L∞=0.06
recall_micro (Thres) ~=0.14, L∞=0.35 ~=0.18, L∞=0.97 ~=0.32, L∞=0.13 ~=0.42, L∞=0.19
(Score) ~=0.79, L∞=0.01 ~=0.79, L∞=0.02 ~=0.78, L∞=0.01 ~=0.60, L∞=0.06

Note that we have considered the stability of the threshold versus lag variation. This comes under the assumption that a stable threshold would provide a stable optimum score. However this has not been checked explicitly and it could also be that a threshold variation does not too largely affect the score. While this coudl hardly be checked on every possible model, we can check that explicitly to the selected four models. To do so, we will plot the variation of threshold from the optimum threshold affects the score, that is x=(threshold - threshold_optimum), y=(score - score_optimum).

Step prediction is also as currenlty being measured in a very subjective way and we shoudl rather explicitly measure the scores for various test_set; that is x=steps, y=score( test_set[ : steps] ).

ForecasterAutoreg / RandomForestRegressor¶

In [3]:
import time_series_utils as tu
import sklearn
from IPython.core.display import SVG


cell_id=0
cat_id=0 
pe_id=0 

# formating the dataset
data = tu.NWDAFDataSet ( cell_id=cell_id, cat_id=cat_id, pe_id=pe_id )
train_len = int( 80 * len( data.df ) / 100 )  
test_len = len( data.df ) - train_len

print( f"test_len : {test_len}" )
regressor = sklearn.ensemble.RandomForestRegressor(\
                  random_state=123,
                  n_estimators=100,
                  max_depth=10 )

forecaster_eval = tu.TimeSerieLagAnalysis( data, train_len, regressor,
                                          forecaster = 'ForecasterAutoreg')
## plotting AUC, and other performances of the model versus lag
lag_min = 2
lag_max = 2000
lag_step = 2
df_auc, svg = forecaster_eval.plot_lag_forcaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
test_len : 3456
Storing results in cell_id-0--cat_id-0--pe_id-0
Out[3]:
No description has been provided for this image
In [4]:
## selecting the most appropriated lag minimizing L∞ over selected_lag +/- delta_lag
## In addition, we want the training model to be lower than 60 seconds.
delta_lag = 100
max_fit_time = 60
selected_lag = forecaster_eval.select_lag_from_forcaster_evaluation( \
    df_auc, delta_lag=delta_lag, max_fit_time=max_fit_time)
L∞ AUC for 100 lag interval: [(1404, 0.0444930232902323), (1406, 0.0444930232902323), (1408, 0.0444930232902323), (1410, 0.04624537516089111), (1412, 0.04624537516089111), (1414, 0.04624537516089111), (1416, 0.04624537516089111), (1418, 0.04624537516089111), (1420, 0.04624537516089111), (1438, 0.04636914315729246), (1440, 0.04636914315729246), (1442, 0.04636914315729246), (1422, 0.04691484193588302), (1424, 0.04691484193588302), (1426, 0.04691484193588302), (1428, 0.04691484193588302), (1430, 0.04691484193588302), (1432, 0.04691484193588302), (1434, 0.04691484193588302), (1436, 0.04691484193588302)]...
  - optimum lag: 1404
  - optimum PR AUC:  ~=0.87, L∞=0.04
  - selected_lag: 412 +/- 100 -- optimizing PR AUC with fit time < 60
Information for model with lag: 412 +/- 100
  - predition time (s):  ~=38.95, L∞=31.31
  - fit time (s):  ~=51.97, L∞=44.48 -- Coefficient [[0.15129642]] s/lag
  - PR AUC:  ~=0.87, L∞=0.07
  - Mean Square Error :  ~=0.22, L∞=0.08
  - Mean Absolute Error :  ~=0.28, L∞=0.19
  - R2 Score :  ~=-0.35, L∞=0.47
  - steps Prediction : 530.00 steps
In [5]:
## plotting scores of the thresholed forecaster
lag_min = 1
lag_max = 800
lag_step = 1
df_score, svg = forecaster_eval.plot_thresholded_forecaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Out[5]:
No description has been provided for this image
In [6]:
## displaying the scores associated to the selected lag interval
sub_df_score = df_score[ ( df_score[ 'lag' ] >= selected_lag - delta_lag ) & ( df_score['lag'] <= selected_lag + delta_lag ) ]
print( f"Scores for lags: {selected_lag} +/- {delta_lag}\n" )
forecaster_eval.print_mMm_scores( sub_df_score, threshold_var=0.25, score_var=0.25 )
Scores for lags: 412 +/- 100


Accepted Metrics:
  - recall_binary:[Score:  ~=1.00, L∞=0.00][Threshold:  ~=0.12, L∞=0.17]

Rejected Metrics (Too high variation):
  - f1_binary: [Score:  ~=0.88, L∞=0.00][Threshold:  ~=0.13, L∞=0.26]
  - f1_micro: [Score:  ~=0.79, L∞=0.01][Threshold:  ~=0.14, L∞=0.35]
  - accuracy: [Score:  ~=0.79, L∞=0.01][Threshold:  ~=0.14, L∞=0.35]
  - precision_binary: [Score:  ~=0.82, L∞=0.12][Threshold:  ~=0.83, L∞=0.86]
  - precision_micro: [Score:  ~=0.79, L∞=0.01][Threshold:  ~=0.14, L∞=0.35]
  - average_precision_micro: [Score:  ~=0.81, L∞=0.03][Threshold:  ~=0.76, L∞=0.86]
  - recall_micro: [Score:  ~=0.79, L∞=0.01][Threshold:  ~=0.14, L∞=0.35]

Prediction seem sto be performed at constant time, while fit time seems to rougly increase linearly with the number lag with some high variations. We are looking a minimizing the number of lag in order to limit the fit time as the forecaster will be updated regularly.

As our data set is unbalanced, we only consider the PR AUC. For lags <= 350, AUC cannot be quantified. For lag >= 350, The PR AUC oscillates between contained values that makes possible to quantify PR AUC. A similar behavior is observed for the mean square error, mean absolute error as well as teh R2 score.

The number of predicted steps shows a very high variation. The way we measure the numbe rof predicted steps is actually the number of steps the model does not which to exactly "1". Such measure is quite liberal and we may rather consider the line that remains below the predicted steps.

A threshold is only valuable if 1) it provides a certain stability regarding lag and 2) the metric it optimize remain also quite stable. The reason we are looking for such stability is that we assume that the traffic may evolve over time and some variation of the lag represents one way the "same" traffic may present different variations. Metric like "precision binary", "average_precision" provide threshold with too much variations. Metrcis like "f1_binary", "recall_binary", "recall_micro" seems acceptable.

ForecasterAutoregWithExog / RandomForestRegressor¶

In [7]:
## defining the new forecaster to eval 
forecaster_eval = tu.TimeSerieLagAnalysis( data, train_len, regressor,
                                          forecaster = 'ForecasterAutoregWithExog')
## plotting AUC, and other performances of the model versus lag
lag_min = 2
lag_max = 2000
lag_step = 2
df_auc, svg = forecaster_eval.plot_lag_forcaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Storing results in cell_id-0--cat_id-0--pe_id-0
Out[7]:
No description has been provided for this image
In [8]:
## selecting the most appropriated lag minimizing L∞ over selected_lag +/- delta_lag
## In addition, we want the training model to be lower than 60 seconds.
delta_lag = 100
max_fit_time = 60
selected_lag = forecaster_eval.select_lag_from_forcaster_evaluation( \
    df_auc, delta_lag=delta_lag, max_fit_time=max_fit_time)
L∞ AUC for 100 lag interval: [(94, 0.01737904533271084), (96, 0.01737904533271084), (98, 0.01737904533271084), (92, 0.018672576592927537), (1800, 0.021179071917753478), (1768, 0.022149741269181566), (1770, 0.022149741269181566), (1772, 0.022149741269181566), (1774, 0.022149741269181566), (1776, 0.022149741269181566), (1778, 0.022149741269181566), (1780, 0.022149741269181566), (1782, 0.022149741269181566), (1784, 0.022149741269181566), (1786, 0.022149741269181566), (1788, 0.022149741269181566), (1790, 0.022149741269181566), (1792, 0.022149741269181566), (1794, 0.022149741269181566), (1796, 0.022149741269181566)]...
  - optimum lag: 94
  - optimum PR AUC:  ~=0.95, L∞=0.02
  - selected_lag: 94 +/- 100 -- optimizing PR AUC with fit time < 60
Information for model with lag: 94 +/- 100
  - predition time (s):  ~=41.77, L∞=41.16
  - fit time (s):  ~=10.69, L∞=18.50 -- Coefficient [[0.09516118]] s/lag
  - PR AUC:  ~=0.95, L∞=0.02
  - Mean Square Error :  ~=0.17, L∞=0.03
  - Mean Absolute Error :  ~=0.19, L∞=0.03
  - R2 Score :  ~=-0.05, L∞=0.18
  - steps Prediction : 3455.00 steps
In [9]:
## plotting scores of the thresholed forecaster
lag_min = 2
lag_max = 800
lag_step = 2
score_df, svg = forecaster_eval.plot_thresholded_forecaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Out[9]:
No description has been provided for this image
In [10]:
## displaying the scores associated to the selected lag interval
sub_df_score = df_score[ ( df_score[ 'lag' ] >= selected_lag - delta_lag ) & ( df_score['lag'] <= selected_lag + delta_lag ) ]
print( f"Scores for lags: {selected_lag} +/- {delta_lag}\n" )
forecaster_eval.print_mMm_scores( sub_df_score, threshold_var=0.25, score_var=0.25 )
Scores for lags: 94 +/- 100


Accepted Metrics:

Rejected Metrics (Too high variation):
  - f1_binary: [Score:  ~=0.88, L∞=0.01][Threshold:  ~=0.17, L∞=0.97]
  - f1_micro: [Score:  ~=0.79, L∞=0.02][Threshold:  ~=0.18, L∞=0.97]
  - accuracy: [Score:  ~=0.79, L∞=0.02][Threshold:  ~=0.18, L∞=0.97]
  - precision_binary: [Score:  ~=0.87, L∞=0.21][Threshold:  ~=0.53, L∞=0.99]
  - precision_micro: [Score:  ~=0.79, L∞=0.02][Threshold:  ~=0.18, L∞=0.97]
  - average_precision_micro: [Score:  ~=0.80, L∞=0.03][Threshold:  ~=0.38, L∞=0.99]
  - recall_binary: [Score:  ~=1.00, L∞=0.03][Threshold:  ~=0.16, L∞=0.97]
  - recall_micro: [Score:  ~=0.79, L∞=0.02][Threshold:  ~=0.18, L∞=0.97]

ForecasterAutoreg / Ridge¶

In [11]:
## defining the new forecaster to eval 
regressor = sklearn.linear_model.Ridge(random_state=123)
forecaster_eval = tu.TimeSerieLagAnalysis( data, train_len, regressor,
                                          forecaster = 'ForecasterAutoreg')
## plotting AUC, and other performances of the model versus lag
lag_min = 2
lag_max = 2000
lag_step = 2
df_auc, svg = forecaster_eval.plot_lag_forcaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Storing results in cell_id-0--cat_id-0--pe_id-0
Out[11]:
No description has been provided for this image
In [12]:
## selecting the most appropriated lag minimizing L∞ over selected_lag +/- delta_lag
## In addition, we want the training model to be lower than 60 seconds.
delta_lag = 100
max_fit_time = 60
selected_lag = forecaster_eval.select_lag_from_forcaster_evaluation( \
    df_auc, delta_lag=delta_lag, max_fit_time=max_fit_time)
L∞ AUC for 100 lag interval: [(1722, 0.0039330912252075745), (1724, 0.0039330912252075745), (1914, 0.004522821858182202), (1916, 0.004522821858182202), (1918, 0.004522821858182202), (1920, 0.004522821858182202), (1922, 0.004522821858182202), (1924, 0.004522821858182202), (1926, 0.004522821858182202), (1928, 0.004522821858182202), (1930, 0.004522821858182202), (1932, 0.004522821858182202), (1934, 0.004522821858182202), (1936, 0.004522821858182202), (1938, 0.004522821858182202), (1720, 0.004562687039216806), (1908, 0.004652269848079205), (1910, 0.004652269848079205), (1912, 0.004652269848079205), (1700, 0.0046752457000522885)]...
  - optimum lag: 1722
  - optimum PR AUC:  ~=0.79, L∞=0.00
  - selected_lag: 1722 +/- 100 -- optimizing PR AUC with fit time < 60
Information for model with lag: 1722 +/- 100
  - predition time (s):  ~=1.03, L∞=4.29
  - fit time (s):  ~=8.26, L∞=9.92 -- Coefficient [[-0.00227267]] s/lag
  - PR AUC:  ~=0.79, L∞=0.00
  - Mean Square Error :  ~=0.17, L∞=0.00
  - Mean Absolute Error :  ~=0.34, L∞=0.00
  - R2 Score :  ~=-0.04, L∞=0.01
  - steps Prediction : 3455.00 steps
In [13]:
## plotting scores of the thresholed forecaster
lag_min = 2
lag_max = 2000
lag_step = 2
df_score, svg = forecaster_eval.plot_thresholded_forecaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Out[13]:
No description has been provided for this image
In [14]:
## displaying the scores associated to the selected lag interval
sub_df_score = df_score[ ( df_score[ 'lag' ] >= selected_lag - delta_lag ) & ( df_score['lag'] <= selected_lag + delta_lag ) ]
print( f"Scores for lags: {selected_lag} +/- {delta_lag}\n" )
forecaster_eval.print_mMm_scores( sub_df_score, threshold_var=0.25, score_var=0.25 )
Scores for lags: 1722 +/- 100


Accepted Metrics:
  - f1_binary:[Score:  ~=0.88, L∞=0.00][Threshold:  ~=0.32, L∞=0.13]
  - f1_micro:[Score:  ~=0.78, L∞=0.01][Threshold:  ~=0.32, L∞=0.13]
  - accuracy:[Score:  ~=0.78, L∞=0.01][Threshold:  ~=0.32, L∞=0.13]
  - precision_binary:[Score:  ~=0.97, L∞=0.19][Threshold:  ~=0.98, L∞=0.24]
  - precision_micro:[Score:  ~=0.78, L∞=0.01][Threshold:  ~=0.32, L∞=0.13]
  - average_precision_micro:[Score:  ~=0.80, L∞=0.01][Threshold:  ~=0.73, L∞=0.08]
  - recall_binary:[Score:  ~=0.99, L∞=0.01][Threshold:  ~=0.28, L∞=0.03]
  - recall_micro:[Score:  ~=0.78, L∞=0.01][Threshold:  ~=0.32, L∞=0.13]

Rejected Metrics (Too high variation):

ForecasterAutoregWithExog / Ridge¶

In [15]:
## defining the new forecaster to eval 
forecaster_eval = tu.TimeSerieLagAnalysis( data, train_len, regressor,
                                          forecaster = 'ForecasterAutoregWithExog')
## plotting AUC, and other performances of the model versus lag
lag_min = 2
lag_max = 2000
lag_step = 2
df_auc, svg = forecaster_eval.plot_lag_forcaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Storing results in cell_id-0--cat_id-0--pe_id-0
Out[15]:
No description has been provided for this image
In [16]:
## selecting the most appropriated lag minimizing L∞ over selected_lag +/- delta_lag
## In addition, we want the training model to be lower than 60 seconds.
delta_lag = 100
max_fit_time = 60
selected_lag = forecaster_eval.select_lag_from_forcaster_evaluation( \
    df_auc, delta_lag=delta_lag, max_fit_time=max_fit_time)
L∞ AUC for 100 lag interval: [(1502, 0.0018713483211346693), (1504, 0.0018713483211346693), (1506, 0.0018713483211346693), (1508, 0.0018713483211346693), (1510, 0.0018713483211346693), (1512, 0.0018713483211346693), (1514, 0.0018713483211346693), (1516, 0.0018713483211346693), (1518, 0.0018713483211346693), (1520, 0.0019636278967860576), (1500, 0.001985028335531158), (1522, 0.0020719451180020165), (1524, 0.0020719451180020165), (1526, 0.0020719451180020165), (1528, 0.0020719451180020165), (1530, 0.0020719451180020165), (1478, 0.002141797668210099), (1480, 0.002141797668210099), (1482, 0.002141797668210099), (1484, 0.002141797668210099)]...
  - optimum lag: 1502
  - optimum PR AUC:  ~=0.81, L∞=0.00
  - selected_lag: 1502 +/- 100 -- optimizing PR AUC with fit time < 60
Information for model with lag: 1502 +/- 100
  - predition time (s):  ~=1.94, L∞=10.71
  - fit time (s):  ~=7.92, L∞=7.79 -- Coefficient [[-0.0018251]] s/lag
  - PR AUC:  ~=0.81, L∞=0.00
  - Mean Square Error :  ~=362.97, L∞=320.57
  - Mean Absolute Error :  ~=9.69, L∞=3.81
  - R2 Score :  ~=-2206.55, L∞=1949.66
  - steps Prediction : 3455.00 steps
In [17]:
## plotting scores of the thresholed forecaster
lag_min = 2
lag_max = 2000
lag_step = 2
df_score, svg = forecaster_eval.plot_thresholded_forecaster_evaluation( lag_min, lag_max, lag_step )
tu.show_svg( svg )
Out[17]:
No description has been provided for this image

Since the PR AUC is decreasing, the selected lag value does not seems appropriated, and we force the analysis and performance within an interval of acceptable lag values. In our case, we select that interval to be [0, 250].

In [18]:
delta_lag = 100
max_fit_time = 60

print( "Selecting according to L∞ on a refined interval [ 0, 250]" )
df_auc = df_auc[ df_auc[ 'lag' ] <= 250 ]

## selecting the most appropriated lag minimizing L∞ over selected_lag +/- delta_lag
## In addition, we want the training model to be lower than 60 seconds.
selected_lag = forecaster_eval.select_lag_from_forcaster_evaluation( \
    df_auc, delta_lag=delta_lag, max_fit_time=max_fit_time)

## displaying the scores associated to the selected lag interval
sub_df_score = df_score[ ( df_score[ 'lag' ] >= selected_lag - delta_lag ) & ( df_score['lag'] <= selected_lag + delta_lag ) ]
print( f"Scores for lags: {selected_lag} +/- {delta_lag}\n" )
forecaster_eval.print_mMm_scores( sub_df_score, threshold_var=0.25, score_var=0.25 )
Selecting according to L∞ on a refined interval [ 0, 250]
L∞ AUC for 100 lag interval: [(196, 0.004701060878532948), (194, 0.004829251083797281), (192, 0.004899540260009605), (190, 0.0051161521075803185), (198, 0.005239885170289171), (188, 0.0052946262965737345), (186, 0.005879102859371321), (184, 0.006314551055556117), (176, 0.006427314917347493), (178, 0.006427314917347493), (180, 0.006427314917347493), (182, 0.006427314917347493), (160, 0.006435553752758838), (162, 0.006435553752758838), (164, 0.006435553752758838), (166, 0.006435553752758838), (168, 0.006435553752758838), (170, 0.006435553752758838), (172, 0.006435553752758838), (174, 0.006435553752758838)]...
  - optimum lag: 196
  - optimum PR AUC:  ~=0.96, L∞=0.00
  - selected_lag: 196 +/- 100 -- optimizing PR AUC with fit time < 60
Information for model with lag: 196 +/- 100
  - predition time (s):  ~=1.54, L∞=6.01
  - fit time (s):  ~=0.59, L∞=2.47 -- Coefficient [[0.00095131]] s/lag
  - PR AUC:  ~=0.96, L∞=0.00
  - Mean Square Error :  ~=0.13, L∞=0.00
  - Mean Absolute Error :  ~=0.31, L∞=0.01
  - R2 Score :  ~=0.18, L∞=0.02
  - steps Prediction : 3455.00 steps
Scores for lags: 196 +/- 100


Accepted Metrics:
  - f1_micro:[Score:  ~=0.60, L∞=0.06][Threshold:  ~=0.42, L∞=0.19]
  - accuracy:[Score:  ~=0.60, L∞=0.06][Threshold:  ~=0.42, L∞=0.19]
  - precision_binary:[Score:  ~=1.00, L∞=0.00][Threshold:  ~=0.89, L∞=0.05]
  - precision_micro:[Score:  ~=0.60, L∞=0.06][Threshold:  ~=0.42, L∞=0.19]
  - average_precision_micro:[Score:  ~=0.86, L∞=0.03][Threshold:  ~=0.70, L∞=0.08]
  - recall_binary:[Score:  ~=0.75, L∞=0.06][Threshold:  ~=0.25, L∞=0.11]
  - recall_micro:[Score:  ~=0.60, L∞=0.06][Threshold:  ~=0.42, L∞=0.19]

Rejected Metrics (Too high variation):
  - f1_binary: [Score:  ~=0.75, L∞=0.04][Threshold:  ~=0.40, L∞=0.27]

Refining The models¶

Hyperparameter tuning¶

In [ ]:
# Hyperparameter grid search
# ==============================================================================
import skforecast.model_selection
import skforecast.ForecasterAutoreg
import sklearn.ensemble

import time_series_utils as tu

## Selecting aand formating a subset of the data
cell_id = 0
cat_id = 0 
pe_id = 0 

# formating the dataset
data_set = tu.NWDAFDataSet( cell_id=cell_id, cat_id=cat_id, pe_id=pe_id )
data_train, data_test = data_set.get_train_test(test_ratio=20, verbose=True )

forecaster = skforecast.ForecasterAutoreg.ForecasterAutoreg(
                 regressor = sklearn.ensemble.RandomForestRegressor(random_state=123),
                 lags      = 12 # This value will be replaced in the grid search
             )

# Candidate values for lags
lag_min = 400
lag_max = 600
lag_step = 10
lags_grid = range( lag_min, lag_max + lag_step, lag_step)

steps = int( len( data_test ) )
print( f"checking if lags are smaller than the series: {steps}" )
for lags in lags_grid:
  if lags > steps:
    raise ValueError( "The maximum lag ({lags}) must be less than the length of the series ({steps})." )      

# Candidate values for regressor's hyperparameters
param_grid = {
    'n_estimators': [ 100 ],
    'max_depth': [8, 10, 12 ]
}

results_grid = skforecast.model_selection.grid_search_forecaster(
                   forecaster         = forecaster,
                   y                  = data_train['y'],
                   param_grid         = param_grid,
                   lags_grid          = lags_grid,
                   steps              = int( len( data_test ) ),
                   refit              = False,
                   metric             = 'mean_squared_error',
                   initial_train_size = int(len(data_train)*0.5),
                   fixed_train_size   = False,
                   return_best        = True,
                   n_jobs             = 'auto',
                   verbose            = False
               )
# Search results
# ==============================================================================
results_grid